aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter
Authors
Abstract
In recent years, deep neural networks (DNNs) have been widely used in many fields. A lot of effort goes into training them because of the large number of parameters in a network, and complex optimizers with many hyperparameters are used to accelerate training and improve generalization; tuning these optimizers is often a trial-and-error process. In this paper, we analyze the different roles that samples play in the parameter update, visually, and find that each sample contributes differently to the update. Furthermore, we present a variant of batch stochastic gradient descent for networks that use ReLU as the activation function in the hidden layers, which we call adaptive stochastic gradient descent (aSGD). Different from existing methods, it calculates an adaptive batch size for each parameter of the model and uses the mean effective gradient for the actual updates. Experimental results on MNIST show that aSGD can speed up the optimization of a DNN and achieve higher accuracy without extra hyperparameters. Experimental results on synthetic datasets show that it finds redundant nodes effectively, which is helpful for model compression.
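To make the idea concrete, below is a minimal NumPy sketch of an aSGD-style update for a one-hidden-layer ReLU regression network. It assumes that a parameter's "effective" samples are those whose per-sample gradient for that parameter is nonzero (ReLU zeroes out many of them), and it averages the summed gradient over that effective count instead of the full batch size. The function `asgd_update`, the network shapes, and the squared-error loss are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def asgd_update(W1, b1, W2, b2, X, y, lr=0.1):
    """One aSGD-style step: per-parameter mean over 'effective' samples only."""
    Z = X @ W1 + b1                    # pre-activations, shape (n, h)
    H = np.maximum(Z, 0.0)             # ReLU hidden layer
    out = H @ W2 + b2                  # linear output, shape (n, 1)
    err = out - y.reshape(-1, 1)       # per-sample residual (squared-error loss)

    # Per-sample gradients for every parameter.
    gW2 = H[:, :, None] * err[:, None, :]          # (n, h, 1)
    gb2 = err                                      # (n, 1)
    relu_mask = (Z > 0).astype(X.dtype)            # (n, h)
    delta1 = (err @ W2.T) * relu_mask              # (n, h)
    gW1 = X[:, :, None] * delta1[:, None, :]       # (n, d, h)
    gb1 = delta1                                   # (n, h)

    def mean_effective(g):
        # Count, per parameter, how many samples actually contribute
        # (nonzero gradient) and average over that count instead of n.
        eff = np.count_nonzero(g, axis=0)
        return g.sum(axis=0) / np.maximum(eff, 1)

    W1 -= lr * mean_effective(gW1)
    b1 -= lr * mean_effective(gb1)
    W2 -= lr * mean_effective(gW2)
    b2 -= lr * mean_effective(gb2)
    return W1, b1, W2, b2
```

In this reading, a parameter that only a few samples activate receives a larger effective step, while a parameter every sample touches behaves as in ordinary mini-batch SGD.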
Similar resources
Adaptive Variance Reducing for Stochastic Gradient Descent
Variance Reducing (VR) stochastic methods are fast-converging alternatives to the classical Stochastic Gradient Descent (SGD) for solving large-scale regularized finite sum problems, especially when a highly accurate solution is required. One critical step in VR is the function sampling. State-of-the-art VR algorithms such as SVRG and SAGA, employ either Uniform Probability (UP) or Importance P...
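As a reference point for the sampling step the excerpt mentions, here is a minimal SVRG loop with uniform-probability sampling; `grad_i`, the epoch and inner-loop counts, and the step size are illustrative assumptions, and importance sampling would simply reweight the index distribution.

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.1, epochs=10, inner=None, rng=None):
    """SVRG with uniform sampling: variance-reduced SGD for a finite sum of n terms.

    grad_i(w, i) returns the gradient of the i-th summand at w. Each outer
    epoch stores a snapshot and its full gradient; each inner step corrects
    the sampled gradient with the snapshot gradient, shrinking update variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    inner = 2 * n if inner is None else inner
    w = np.asarray(w0, float).copy()
    for _ in range(epochs):
        snapshot = w.copy()
        full_grad = sum(grad_i(snapshot, i) for i in range(n)) / n   # full gradient at snapshot
        for _ in range(inner):
            i = rng.integers(n)                                      # uniform-probability sampling
            g = grad_i(w, i) - grad_i(snapshot, i) + full_grad       # variance-reduced gradient
            w = w - lr * g
    return w
```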
Convergence diagnostics for stochastic gradient descent with constant step size
Iterative procedures in stochastic optimization are typically comprised of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...
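A minimal sketch of such a phase-transition diagnostic, assuming the test statistic is a running sum of inner products between consecutive stochastic gradients, which tend to align during the transient phase and anti-correlate once the iterates oscillate around a point; `grad_fn`, `burn_in`, and the sign-change detection rule are illustrative assumptions rather than the paper's exact test.

```python
import numpy as np

def sgd_with_convergence_diagnostic(grad_fn, theta0, data, lr=0.05, burn_in=100):
    """Constant-step-size SGD that reports when a stationarity diagnostic fires."""
    theta = np.asarray(theta0, dtype=float)
    prev_grad = None
    stat = 0.0
    for t, sample in enumerate(data):
        g = grad_fn(theta, sample)
        if prev_grad is not None:
            stat += float(np.dot(prev_grad, g))   # inner product of consecutive gradients
        prev_grad = g
        theta = theta - lr * g
        # After a short burn-in, a negative running sum suggests the stationary phase.
        if t > burn_in and stat < 0.0:
            return theta, t                        # iterate and detection time
    return theta, None
```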
Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
In this paper we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods. The choice of the batch size induces a trade-off between the accuracy of the gradient estimate and the cost in terms of samples of each update. We propose to determine the batch size by optimizing the ratio between a lower bound to a linear or quadratic Taylor approximatio...
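As a rough illustration of this trade-off, the sketch below scores candidate batch sizes by a standard smoothness-based lower bound on the expected one-step decrease, divided by the number of samples consumed. The bound, the estimated inputs (`grad_sq_norm`, `noise_var`, `smoothness`), and the candidate range are assumptions for illustration, not the bound derived in the cited paper.

```python
import numpy as np

def choose_batch_size(grad_sq_norm, noise_var, lr, smoothness, candidates=range(1, 1025)):
    """Pick the batch size b maximizing expected improvement per sample.

    Uses the descent-lemma bound
        E[f(w) - f(w')] >= lr*||g||^2 - (lr^2 * L / 2) * (||g||^2 + sigma^2 / b)
    for an unbiased mini-batch gradient with per-sample variance sigma^2.
    """
    best_b, best_ratio = 1, -np.inf
    for b in candidates:
        decrease = lr * grad_sq_norm - 0.5 * lr**2 * smoothness * (grad_sq_norm + noise_var / b)
        ratio = decrease / b          # improvement per sample consumed
        if ratio > best_ratio:
            best_b, best_ratio = b, ratio
    return best_b
```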
A stochastic gradient adaptive filter with gradient adaptive step size
This paper presents an adaptive step-size gradient adaptive filter. The step size of the adaptive filter is changed according to a gradient descent algorithm designed to reduce the squared estimation error during each iteration. An approximate analysis of the performance of the adaptive filter when its inputs are zero mean, white, and Gaussian and the set of optimal coefficients are time varyin...
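A generic sketch of a gradient-adaptive step-size LMS filter in this spirit: the step size is nudged by gradient descent on the squared estimation error, estimated from the correlation of consecutive errors and input vectors, and then clipped to a safe range. Parameter names and the clipping bounds are assumptions, not the exact algorithm analyzed in the cited paper.

```python
import numpy as np

def gass_lms(x, d, num_taps=8, mu0=0.01, rho=1e-4, mu_min=1e-5, mu_max=0.1):
    """LMS adaptive filter whose step size is itself adapted each iteration."""
    x = np.asarray(x, float)
    d = np.asarray(d, float)
    w = np.zeros(num_taps)
    mu = mu0
    prev_err, prev_u = 0.0, np.zeros(num_taps)
    y_hat = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]   # current regressor vector
        y_hat[n] = w @ u
        err = d[n] - y_hat[n]
        # Gradient-descent update of the step size on the squared error.
        mu = float(np.clip(mu + rho * err * prev_err * (u @ prev_u), mu_min, mu_max))
        w = w + mu * err * u                   # standard LMS coefficient update
        prev_err, prev_u = err, u
    return w, y_hat
```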
Adaptive wavefront control with asynchronous stochastic parallel gradient descent clusters.
A scalable adaptive optics (AO) control system architecture composed of asynchronous control clusters based on the stochastic parallel gradient descent (SPGD) optimization technique is discussed. It is shown that subdivision of the control channels into asynchronous SPGD clusters improves the AO system performance by better utilizing individual and/or group characteristics of adaptive system co...
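For context, a single-loop SPGD sketch for a performance metric to be maximized: all control channels are perturbed in parallel with random ±sigma steps, the two-sided change in the metric is measured, and the controls move in proportion to that change times the perturbation. The asynchronous clustering described in the excerpt is not modeled here; `metric`, `gain`, and `sigma` are illustrative assumptions.

```python
import numpy as np

def spgd_optimize(metric, u0, gain=0.5, sigma=0.05, iters=200, rng=None):
    """Stochastic parallel gradient descent on a control vector (maximizes metric)."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.asarray(u0, float).copy()
    for _ in range(iters):
        delta = sigma * rng.choice([-1.0, 1.0], size=u.shape)  # parallel Bernoulli perturbation
        dj = metric(u + delta) - metric(u - delta)              # two-sided metric change
        u += gain * dj * delta                                  # update all channels at once
    return u
```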
Journal
Journal title: Mathematics
Year: 2022
ISSN: 2227-7390
DOI: https://doi.org/10.3390/math10060863